"DRIFT-FREE VIDEO ENCODING AND DECODING METHOD"

FIELD OF THE INVENTION 

The present invention relates to an encoding method for the compression of 
an original video sequence divided into successive groups of frames (GOFs), and to a 
corresponding video decoding method. 

BACKGROUND OF THE INVENTION 

Current video standards (from MPEG-1 to H.26L) often use so-called hybrid
solutions based on a predictive scheme where each frame is either intra coded (I
frames) or temporally predicted from a given reference frame (the prediction options
being, as shown in Fig.1, a forward prediction, for the P frames, or a bi-directional
prediction, for the B frames), the prediction error thus obtained being then spatially
transformed (a 2D-DCT transform is used in the standard schemes) to take advantage
of spatial redundancies. According to a different approach proposed in the document
"Three-dimensional subband coding of video", C.Podilchuk et al., IEEE Transactions
on Image Processing, vol.4, n°2, February 1995, pp.125-139, a group of frames (GOF)
is processed as a three-dimensional (2D+t, or 3D) structure and spatio-temporally
filtered in order to compact the energy in the low frequencies (further studies included
motion compensation in this scheme in order to improve the overall coding efficiency).
The obtained 3D subband structure is depicted in Fig.2. The well known SPIHT
algorithm, extended from 2D to 3D, was then used in order to efficiently encode the
final coefficient bit-planes with respect to the spatio-temporal decomposition structure.

According to the usual implementation of a 3D subband codec, a
motion-compensated (MC) spatio-temporal analysis is applied at the full original
resolution, spatial scalability being then achieved by discarding the portions of the
bitstream corresponding to the highest spatial subbands of the decomposition. When
motion compensation is used in the 3D analysis scheme, this method does not allow a
perfect reconstruction of the video sequence at lower resolution, even with an infinite
bit-rate. This phenomenon will be referred to as drift in the following description. As
explained in the document "Multiscale video compression using wavelet transform and
motion compensation", P.Y.Cheng et al., Proceedings of the International
Conference on Image Processing (ICIP95), vol.1, 1995, pp.606-609, said drift comes
from the fact that the wavelet transform and the motion compensation are not
interchangeable. When a frame (A) is synthesized at a lower resolution (a), the
following operation is applied :

a = DWT_L(L) + MC[DWT_L(H)]
  = DWT_L(A) + [MC[DWT_L(H)] - DWT_L(MC[H])]    (1)



where DWT_L denotes the resolution downsampling using the same wavelet filters as in
the 3D analysis. In a perfectly scalable solution, one wants to have:

a = DWT_L(A)    (2)

The remaining part of the expression (1) therefore corresponds to the drift. It can be
noticed that, if no MC is applied, the drift vanishes; the same holds
(except at the image borders) if a unique motion vector is applied to the frame. Yet, it
is known that MC is unavoidable to achieve a good coding efficiency, and the
likelihood of a unique global motion is small enough to eliminate this particular case in
the following paragraphs.
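
The non-interchangeability behind expression (1) can be illustrated on a single image line. The following is only a minimal sketch under assumptions that are not taken from the cited documents: a Haar low-pass filter stands for DWT_L, motion compensation is modelled as a circular shift of the whole line, and the low resolution vector is the full resolution vector halved and rounded down. The helper names (`dwt_low`, `mc`, `drift`) are hypothetical.

```python
import math

def dwt_low(x):
    """Haar low-pass filter followed by downsampling by 2 (assumed filter)."""
    return [(x[2 * i] + x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]

def mc(x, v):
    """Toy motion compensation: circular shift of the whole line by v samples."""
    n = len(x)
    return [x[(i - v) % n] for i in range(n)]

def drift(h, v_full):
    """Drift term of expression (1): MC[DWT_L(H)] - DWT_L(MC[H])."""
    v_low = v_full // 2  # assumption: halved vector, rounded down to an integer
    mc_after_dwt = mc(dwt_low(h), v_low)
    dwt_after_mc = dwt_low(mc(h, v_full))
    return [p - q for p, q in zip(mc_after_dwt, dwt_after_mc)]

h = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0]
print(drift(h, 0))  # no motion: the drift term is zero everywhere
print(drift(h, 1))  # odd displacement: the two orders of operations disagree
```

In this toy model an even displacement also gives a zero drift term, because the halved vector is then exact at low resolution; the mismatch appears as soon as the low resolution vector can only approximate the full resolution one.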

Some authors, such as J.W.Woods et al. in the document "A resolution and
frame-rate scalable subband/wavelet video coder", IEEE Transactions on Circuits and
Systems for Video Technology, vol.11, n°9, September 2001, pp.1035-1044, have
already proposed technical solutions in order to get rid of this drift. However, in said
document, the described scheme, in addition to being quite complex, implies the
sending of extra information (the drift correction necessary to correctly synthesize
the upper resolution) in the bitstream, thus wasting some bits. The solution described
in the document "Multiscale video compression..." previously cited avoids this
bottleneck but works on a predictive scheme and is not transposable to the 3D
subband codec.

It has then been proposed, in the European patent application n°02290155.7
(PHFR020002) filed on January 22nd, 2002, a solution avoiding these drawbacks.
According to that solution, the video encoding method, used for the compression of an
original video sequence divided into successive groups of frames (GOFs), comprised
the steps of :

(1) generating from the original video sequence, by means of a wavelet 
decomposition, a low resolution sequence including successive low resolution GOFs ; 

(2) performing on said low resolution sequence a low resolution
decomposition, by means of a motion compensated spatio-temporal analysis of each
low resolution GOF ;

(3) generating from said low resolution decomposition a full resolution 
sequence, by means of an anchoring of the high frequency spatial subbands resulting 
from the wavelet decomposition to said low resolution decomposition ; 

(4) coding said full resolution sequence and the motion vectors generated 
during the motion compensated spatio-temporal analysis, for generating an output 
coded bitstream. 

Said solution, in which the global structure of the decomposition tree in the
3D subband analysis is preserved and no extra information is sent to correct the drift effect
(only the decomposition/reconstruction mechanism is changed), is now recalled in a
more detailed manner with reference to the coding scheme of Fig.3.




Two main steps are provided : (a) a motion compensation step at the lowest
resolution, (b) an encoding step of the high spatial subbands. First, in order to avoid
drift at lower resolutions, motion compensation (MC) is applied at this level.
Consequently one first downsizes the GOF using the wavelet filters. Then the usual 3D
subband MC-decomposition scheme is applied to this downsized GOF (it may be
noticed that a side effect of this method is the reduction of the number of motion
vectors to be sent in the bitstream if the block size of the MC is the same as in a
full-resolution process, which saves some bits for texture coding). Before transmitting
the subbands to a tree-based entropy coder (for instance to a 3D-SPIHT encoder such
as described in the document "Low bit-rate scalable video coding with 3D set
partitioning in hierarchical trees (3D-SPIHT)", B.J.Kim et al., IEEE Transactions on
Circuits and Systems for Video Technology, vol.10, n°8, December 2000, pp.1374-1387),
one adds the high spatial subbands that allow the reconstruction of the full
resolution. The final tree structure looks very similar to that of a 3D subband codec
such as the one described in the document "A fully scalable 3D subband video codec",
V.Bottreau et al., Proceedings of IEEE Conference on Image Processing (ICIP2001),
vol.2, pp.1017-1020, Thessaloniki, Greece, October 7-10, 2001, and so a tree-based
entropy coder can be applied on it without any restriction.
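
The saving in motion vectors mentioned above is simple arithmetic: with an unchanged block size, one dyadic spatial level down divides the number of blocks, and hence of vectors, by about four. The sketch below only illustrates this count; the frame sizes (a CIF frame and its half-size version) and the 16x16 block size are hypothetical values, and `motion_vector_count` is not a function of any codec.

```python
def motion_vector_count(width, height, block=16):
    """Number of block motion vectors for one frame pair (one vector per block)."""
    blocks_x = -(-width // block)   # ceiling division
    blocks_y = -(-height // block)
    return blocks_x * blocks_y

full = motion_vector_count(352, 288)  # hypothetical full resolution frame
low = motion_vector_count(176, 144)   # same frame one spatial level down
print(full, low)
```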

Concerning the second step of coding the high spatial subbands, two main 
solutions are proposed, the first one without MC, and the second one with MC. 

In the first solution, the high subbands simply correspond to the high
frequency spatial subbands of the original (full resolution) frames of the GOF in the
wavelet decomposition. Those subbands allow the reconstruction at full resolution at
the decoding side. Indeed, the frames can be decoded at the low resolution. However,
these frames correspond to the low spatial subband in the wavelet analysis of the
original frames. Hence one has merely to put the low resolution frames and the
corresponding high subbands together and apply a wavelet synthesis to obtain the full
resolution frames, and thus to optimize the 3D-SPIHT encoder. In an MC scheme for a
3D subband encoder, the low temporal subbands always look like one of the original
frames of the GOF. As a matter of fact :

L = \frac{1}{\sqrt{2}} [A + MC^{-1}(B)]    (3)

so L looks like A. Consequently, the high spatial subband of A should be placed with
the low resolution decomposition corresponding to L. This approach (reordering of the
high spatial subbands in the case of forward MC) is illustrated in Fig.4, where jt
indicates the temporal decomposition level (0 for the full frame rate, jt_max for the
lowest frame rate), nf is the subband index at the temporal level jt, DWT_H denotes the
high frequency wavelet filter and the coefficients c, t are multiplication coefficients.
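
The relation (3), read here as L = (A + MC^-1(B))/sqrt(2), can be checked against the two lifting steps H = (B - MC(A))/sqrt(2) and L = sqrt(2).A + MC^-1(H). The sketch below is only a numerical check under an assumption made for illustration: motion compensation is a circular shift, so that MC^-1 exactly undoes MC. The names `mc`, `mc_inv` and `lifting` are hypothetical.

```python
import math

S = math.sqrt(2)

def mc(x, v):
    """Toy motion compensation: circular shift by v samples (exactly invertible)."""
    n = len(x)
    return [x[(i - v) % n] for i in range(n)]

def mc_inv(x, v):
    """Inverse motion compensation: the same vector used in the reverse direction."""
    return mc(x, -v)

def lifting(a, b, v):
    """Two lifting steps of the motion compensated temporal Haar filter."""
    h = [(bi - pi) / S for bi, pi in zip(b, mc(a, v))]
    l = [S * ai + qi for ai, qi in zip(a, mc_inv(h, v))]
    return l, h

a = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
b = [2.0, 6.0, 5.0, 3.0, 5.0, 8.0]
l, _ = lifting(a, b, 2)
closed_form = [(ai + bi) / S for ai, bi in zip(a, mc_inv(b, 2))]
print(max(abs(x - y) for x, y in zip(l, closed_form)))  # of the order of rounding error
```

With a real block-based MC the inverse is only approximate, so the equality holds approximately, which is why the text says that L "looks like" A.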




In the second solution, as using MC in every subband does not allow a
reconstruction with no drift, it is also possible to partially use MC to construct the high
spatial subbands and still be able to reconstruct every resolution. Instead of directly using
the high frequency spatial subbands of the wavelet decomposition, a wavelet
decomposition [...]
full resolution sequence and reusing for instance the motion vectors of the low resolution.

SUMMARY OF THE INVENTION 

It is then an object of the invention to improve the previously described 
solution by keeping its good behaviour at low resolution while getting closer to the 
performance of a classic 3D subband codec at full resolution. 

To this end, the invention relates to a video encoding method for the
compression of an original video sequence divided into successive groups of frames
(GOFs), said method comprising the steps of :

(1) generating from the original video sequence, by means of a wavelet 
decomposition, a set of low resolution frames organized in successive low resolution 
GOFs; 

(2) performing on said low resolution frames a motion compensated spatio- 
temporal analysis, leading to a low resolution sequence ; 

(3) performing a motion compensated spatio-temporal analysis of each full 
resolution GOF of the original video sequence ; 

(4) replacing at each temporal decomposition level the low-frequency 
subbands of said decomposition by the corresponding spatio-temporal subbands of the 
low resolution sequence ; 

(5) coding the modified sequence thus obtained and the motion vectors 
generated during the motion compensated spatio-temporal analysis of each full 
resolution GOF, for generating an output coded bitstream. 

The invention also relates to a corresponding decoding method. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The invention will now be described in a more detailed manner, with
reference to the accompanying drawings in which :

- Fig.1 illustrates the different predictions in a typical hybrid video encoding
scheme ;

- Fig.2 shows a 3D subband decomposition ;

- Fig.3 depicts an embodiment of an encoding scheme according to a
previous embodiment ;

- Fig.4 illustrates the reordering of the high spatial subbands (for a forward
motion compensation) ;

- Fig.5 illustrates the main steps of the encoding method according to the
invention ;

- Fig.6 illustrates the corresponding motion compensated temporal filtering
decomposition scheme ;

- Fig.7 illustrates at the decoding side an implementation of a synthesis
scheme corresponding to the encoding method of Fig.5.

DETAILED DESCRIPTION OF THE INVENTION 

As for the previously described solution, the present invention is now
explained with reference to its basic steps : (a) motion compensation at the lowest
resolution (this first step, Motion Compensation (MC), is, in fact, strictly equivalent to
the one described in the case of the previous solution ; one first downsizes the GOF
using the spatial wavelet filters, and the usual 3D subband MC-decomposition scheme
is then applied to this downsized GOF), (b) encoding the high spatial subbands.

The main difference with said previous solution lies in the second step, the
principle of which is to inject at each decomposition level the temporal subbands of
the low spatial resolution analysis into those of the full-resolution one. It is thus
possible to reconstruct the original frames at the decoder side while performing a real
temporal filtering (and not just an intra coding or a predictive difference - as in the
previous solution - for the high frequency spatial subbands).

The following equations explain the mechanism in a more detailed manner.
As said above, the first temporal analysis is performed at low resolution, which may be
expressed by the equations (4) and (5) :

H_d = [B_d - MC_{down}(A_d)] / \sqrt{2}    (4)

L_d = \sqrt{2}.A_d + MC_{down}^{-1}(H_d)    (5)

with the following notations:
A = reference frame
B = current frame
DWT = discrete wavelet transform

A_d = low-frequency spatial subband of the DWT of frame A, i.e. a low-spatial
resolution version of frame A

B_d = low-frequency spatial subband of the DWT of frame B, i.e. a low-spatial
resolution version of frame B

H_d = high-frequency temporal subband at the low spatial resolution
L_d = low-frequency temporal subband at the low spatial resolution
MC_{down} = motion compensation performed on low-resolution (i.e. sub-sampled)
frames.

MC^{-1} = inverse motion compensation (motion vectors computed to predict a
frame B from a frame A are reversely used to predict the frame A from the frame B).
The equations (6) to (9) then make it possible to define L_s and H_s :

H' = B - MC_{full}(A)    (6)

L' = \sqrt{2}.A + MC_{full}^{-1}(H)    (7)

H_s = H'_s    (8)

L_s = L'_s    (9)

with :

X_s = union of the three high-frequency spatial subbands of the DWT of a
given frame X (with X_s = H_s or L_s)

MC_{full} = motion compensation performed on full-resolution frames

L' and H' = respectively the low-frequency and high-frequency temporal
subbands in a conventional 3D subband scheme

H = DWT^{-1} [H_d \cup H_s]

L = DWT^{-1} [L_d \cup L_s]
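
One temporal level of this analysis can be sketched on 1-D lines. This is only an illustration under stated assumptions, not the encoder of the invention: the spatial DWT is a one-level Haar transform (so X_s is a single high subband instead of three), MC is a circular shift, the low resolution vector is the full resolution vector halved and rounded down, and the equations (7) and (9), which are hard to read in the source, are taken as L' = sqrt(2).A + MC_full^-1(H) and L_s = L'_s. All function names are hypothetical.

```python
import math

S = math.sqrt(2)

def dwt(x):
    """One-level Haar DWT of a line: returns (low, high) spatial subbands."""
    low = [(x[2 * i] + x[2 * i + 1]) / S for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) / S for i in range(len(x) // 2)]
    return low, high

def idwt(low, high):
    """Inverse Haar DWT, i.e. DWT^-1 [low U high]."""
    x = []
    for l, h in zip(low, high):
        x += [(l + h) / S, (l - h) / S]
    return x

def mc(x, v):
    """Toy motion compensation: circular shift; mc(x, -v) is its inverse."""
    n = len(x)
    return [x[(i - v) % n] for i in range(n)]

def encode_level(a, b, v_full):
    """Return the full resolution subbands L, H and the injected L_d, H_d."""
    v_down = v_full // 2                                        # assumed vector scaling
    a_d, _ = dwt(a)
    b_d, _ = dwt(b)
    h_d = [(q - p) / S for q, p in zip(b_d, mc(a_d, v_down))]   # equation (4)
    l_d = [S * p + q for p, q in zip(a_d, mc(h_d, -v_down))]    # equation (5)
    h_prime = [q - p for q, p in zip(b, mc(a, v_full))]         # equation (6)
    _, h_s = dwt(h_prime)                                       # equation (8)
    h = idwt(h_d, h_s)                                          # H = DWT^-1 [H_d U H_s]
    l_prime = [S * p + q for p, q in zip(a, mc(h, -v_full))]    # equation (7), as assumed
    _, l_s = dwt(l_prime)                                       # equation (9), as assumed
    l = idwt(l_d, l_s)                                          # L = DWT^-1 [L_d U L_s]
    return l, h, l_d, h_d

a = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
b = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0]
l, h, l_d, h_d = encode_level(a, b, 3)
```

By construction the low spatial subband of L is exactly L_d and that of H is exactly H_d, which is what lets a low resolution decoder ignore the high spatial subbands altogether.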

Once all the low-frequency and high-frequency temporal subbands have
been generated at a given temporal level jt, both at low and full spatial resolutions,
the low-frequency temporal subbands L are further decomposed to achieve the next
temporal level jt+1.

This is repeated at each step of the temporal decomposition, leading finally
to a structure of the temporal decomposition which is very similar to that of a classic
3D subband encoder. The low frequency temporal subband of the last level and the
high frequency temporal subbands of all levels are then spatially decomposed through
wavelet filters and encoded to form the bitstream.

The described invention keeps the good behaviour of the previous solution at
low resolution while getting closer to the performance of a classic 3D subband codec
at full resolution (the global structure of the decomposition tree in the 3D subband
analysis is preserved and no extra information is sent to correct the drift effect ; only
the decomposition/reconstruction mechanism is changed). The main improvement comes
from the new approach to generate the high-frequency spatial subbands, which brings
more coherence to the decomposition tree and therefore improves the coding
efficiency of the system.

At the decoder, all the previous equations can be inverted to allow a good
reconstruction. Only a hat (^) is added to every subband in order to indicate that decoding
is now concerned and that some information might have been lost. First a classic 3D
subband synthesis at low resolution gives back the low spatial resolution
subbands \hat{A}_d and \hat{B}_d from \hat{L}_d and \hat{H}_d :

\hat{A}_d = \frac{1}{\sqrt{2}} [\hat{L}_d - MC_{down}^{-1}(\hat{H}_d)]    (10)

\hat{B}_d = MC_{down}(\hat{A}_d) + \sqrt{2}.\hat{H}_d    (11)

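It is also easy to get \hat{A}_s by synthesizing \hat{H} and by inverting the equation (7). The
process is explained by the equations (12) to (15) :

\hat{H} = DWT^{-1} [\hat{H}_d \cup \hat{H}_s]    (12)

\hat{L} = DWT^{-1} [\hat{L}_d \cup \hat{L}_s]    (13)

\hat{A}'' = \frac{1}{\sqrt{2}} [\hat{L} - MC_{full}^{-1}(\hat{H})]    (14)

\hat{A}_s = \hat{A}''_s    (15)

Then \hat{A} is simply reconstructed from \hat{A}_d and \hat{A}_s. Consequently one can get \hat{B}_s and
finally synthesize \hat{B}. This is summarized by the system of equations (16) to (19) :

\hat{A} = DWT^{-1} [\hat{A}_d \cup \hat{A}_s]    (16)

\hat{B}' = MC_{full}(\hat{A}) + \hat{H}    (17)

\hat{B}_s = \hat{B}'_s    (18)

\hat{B} = DWT^{-1} [\hat{B}_d \cup \hat{B}_s]    (19)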



These operations are repeated until the very first temporal level, i.e. until the GOF is
fully decoded. It can clearly be seen that this scheme generates no drift since perfect
reconstruction is achieved as soon as L and H are completely transmitted in the
bitstream (it can also be noted that the full spatial resolution synthesis is now intimately
linked with the low resolution one at each temporal level, which was not the case in
the previous solution).
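
The absence of drift can be checked numerically on a toy model. The sketch below is not the codec of the invention: it assumes 1-D lines, a one-level Haar spatial DWT, MC as a circular shift, a low resolution vector obtained by halving the full resolution one, no quantization, and the reading L' = sqrt(2).A + MC_full^-1(H), L_s = L'_s for the equations (7) and (9). All names are hypothetical.

```python
import math

S = math.sqrt(2)

def dwt(x):
    """One-level Haar DWT: returns (low, high) spatial subbands."""
    n = len(x) // 2
    return ([(x[2 * i] + x[2 * i + 1]) / S for i in range(n)],
            [(x[2 * i] - x[2 * i + 1]) / S for i in range(n)])

def idwt(low, high):
    """Inverse Haar DWT, i.e. DWT^-1 [low U high]."""
    x = []
    for l, h in zip(low, high):
        x += [(l + h) / S, (l - h) / S]
    return x

def mc(x, v):
    """Toy motion compensation: circular shift; mc(x, -v) is its inverse."""
    return [x[(i - v) % len(x)] for i in range(len(x))]

def comb(x, y, cx=1.0, cy=1.0):
    """Linear combination cx*x + cy*y, sample by sample."""
    return [cx * p + cy * q for p, q in zip(x, y)]

def encode(a, b, v):
    vd = v // 2
    a_d, b_d = dwt(a)[0], dwt(b)[0]
    h_d = comb(b_d, mc(a_d, vd), 1 / S, -1 / S)     # (4)
    l_d = comb(a_d, mc(h_d, -vd), S, 1.0)           # (5)
    h_s = dwt(comb(b, mc(a, v), 1.0, -1.0))[1]      # (6), (8)
    h = idwt(h_d, h_s)
    l_s = dwt(comb(a, mc(h, -v), S, 1.0))[1]        # (7), (9) as assumed
    return l_d, h_d, l_s, h_s

def decode_low(l_d, h_d, vd):
    a_d = comb(l_d, mc(h_d, -vd), 1 / S, -1 / S)    # (10)
    b_d = comb(mc(a_d, vd), h_d, 1.0, S)            # (11)
    return a_d, b_d

def decode_full(l_d, h_d, l_s, h_s, v):
    a_d, b_d = decode_low(l_d, h_d, v // 2)
    h = idwt(h_d, h_s)                              # (12)
    l = idwt(l_d, l_s)                              # (13)
    a_s = dwt(comb(l, mc(h, -v), 1 / S, -1 / S))[1] # (14), (15)
    a = idwt(a_d, a_s)                              # (16)
    b_s = dwt(comb(mc(a, v), h))[1]                 # (17), (18)
    return a, idwt(b_d, b_s)                        # (19)

a = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
b = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0]
l_d, h_d, l_s, h_s = encode(a, b, 3)
a_low, b_low = decode_low(l_d, h_d, 3 // 2)
a_hat, b_hat = decode_full(l_d, h_d, l_s, h_s, 3)
```

In this setting the low resolution decoder recovers exactly the low spatial subbands of A and B while using only L_d and H_d, and the full resolution decoder recovers A and B themselves, up to floating point rounding.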

The encoding principle defined above will now be described in a more
detailed manner, with reference to Fig.5, that illustrates the main steps of the
encoding method, and Fig.6, that illustrates the corresponding motion compensated
temporal filtering scheme.

In the encoding scheme of Fig.5, the original group of frames (this current
GOF comprises full resolution frames) is first used for generating, by means of a
wavelet decomposition, low resolution frames on which a motion compensated
spatio-temporal analysis is then performed. A low resolution sequence is thus obtained. The
original full resolution frames (i.e. each full resolution GOF) are also used for performing
a motion compensated spatio-temporal analysis (the corresponding successive steps
are designated by : "MC-temporal analysis" and "wavelet decomposition").

After these two parallel sets of steps performed on the full resolution frames,
the low frequency subbands of the decomposition thus obtained are iteratively
replaced, at each temporal decomposition level, by the corresponding spatio-temporal
subbands of the low resolution sequence, according to the following operations :

(a) first, a storing operation, for storing the high frequency spatio-temporal 
subbands of the decomposition in view of the final encoding step ; 

(b) then a wavelet synthesis, performed from the low frequency spatio- 
temporal subbands of said decomposition ; 

(c) then a test concerning the rank of the temporal decomposition level, for
storing the low frequency spatio-temporal subbands of the decomposition if said level
is the last one, the two parallel sets of steps being on the contrary further carried out
for the next temporal level if said level is not the last one.
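
The control flow of the operations (a) to (c) can be sketched as a loop over the temporal levels. The sketch is only an outline under assumptions: the pair analysis is passed in as a callback that is supposed to return the already synthesized low and high temporal subbands of a frame pair, and the demonstration uses a plain temporal Haar filter on scalar "frames", without motion compensation or subband replacement. The names `decompose_gof` and `haar_pair` are hypothetical.

```python
import math

def decompose_gof(frames, analyse_pair, levels):
    """Iterate the temporal decomposition; returns (stored_high, final_low)."""
    stored_high = []
    current = list(frames)
    for _ in range(levels):
        lows, highs = [], []
        for k in range(0, len(current), 2):
            low, high = analyse_pair(current[k], current[k + 1])
            lows.append(low)
            highs.append(high)
        stored_high.append(highs)  # (a) keep the high frequency subbands
        current = lows             # (b), (c) the lows feed the next level
    return stored_high, current    # the lows of the last level are stored too

def haar_pair(a, b):
    """Stand-in pair analysis: temporal Haar filter without MC."""
    s = math.sqrt(2)
    return (a + b) / s, (b - a) / s

# GOF of four scalar "frames", two temporal levels.
stored_high, final_low = decompose_gof([1.0, 3.0, 5.0, 7.0], haar_pair, 2)
print(len(stored_high[0]), len(stored_high[1]), len(final_low))
```

For a GOF of four frames and two levels, the loop stores two high frequency subbands at the first level, one at the second, and a single low frequency subband at the end.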

More detailed representations of the whole decomposition scheme and the
corresponding motion-compensated synthesis scheme at the decoding side can be
seen in Fig.6 and Fig.7 respectively. This example of a spatio-temporal decomposition
according to the invention is related to a GOF of only four frames A0 to A3 (for the
sake of simplicity), with a forward motion compensation and two decomposition levels.
The high and low frequency (H'_0, H'_1 and L'_0, L'_1 respectively) temporal subbands are
computed from the original frames by using the so-called lifting scheme, described for
instance in the document "Factoring wavelet transforms into lifting steps",
I.Daubechies and W.Sweldens, Bell Laboratories technical report, Lucent Technologies,
1996. The notations DWT and DWT^{-1} respectively designate the wavelet
decomposition and the wavelet synthesis. The right side of Fig.6 illustrates
successively the first spatio-temporal decomposition level, the inverse synthesis
applied to the low frequency spatio-temporal subbands of the decomposition and the
second spatio-temporal decomposition level (performed after the replacement of the
low frequency subbands of the decomposition by the corresponding spatio-temporal
subbands of the low resolution sequence, said replacement being indicated by the
arrows coming from the left side of Fig.6).



CLAIMS : 

1. A video encoding method for the compression of an original video sequence
divided into successive groups of frames (GOFs), said method comprising the steps
of:

(1) generating from the original video sequence, by means of a wavelet
decomposition, a set of low resolution frames organized in successive low resolution
GOFs ;

(2) performing on said low resolution frames a motion compensated
spatio-temporal analysis, leading to a low resolution sequence ;

(3) performing a motion compensated spatio-temporal analysis of each full
resolution GOF of the original video sequence ;

(4) replacing at each temporal decomposition level the low-frequency
subbands of said decomposition by the corresponding spatio-temporal subbands of the
low resolution sequence ;

(5) coding the modified sequence thus obtained and the motion vectors
generated during the motion compensated spatio-temporal analysis of each full
resolution GOF, for generating an output coded bitstream.

2. A video decoding method, provided for decoding a coded bitstream
corresponding to a video sequence coded by means of a video encoding method
comprising, for the compression of said original video sequence, the steps of :

(1) generating from the original video sequence, by means of a wavelet
decomposition, a set of low resolution frames organized in successive low resolution
GOFs ;

(2) performing on said low resolution frames a motion compensated
spatio-temporal analysis, leading to a low resolution sequence ;

(3) performing a motion compensated spatio-temporal analysis of each full
resolution GOF of the original video sequence ;

(4) replacing at each temporal decomposition level the low-frequency
subbands of said decomposition by the corresponding spatio-temporal subbands of the
low resolution sequence ;

(5) coding the modified sequence thus obtained and the motion vectors
generated during the motion compensated spatio-temporal analysis of each full
resolution GOF, for generating an output coded bitstream ;

said video decoding method comprising the steps illustrated in the MC temporal
synthesis scheme of Fig.7.




Abstract 

Three-dimensional (3D) subband coding schemes use motion compensation (MC) in
their temporal filtering stage. Unfortunately, this procedure introduces two
drawbacks : (a) the MC being applied at the full resolution, a drift appears when
decoding at a lower resolution, and (b) all the motion vectors estimated at full
resolution are transmitted, which is a waste of bits. According to the invention, a low
resolution sequence is first obtained by generating from the original input sequence of
frames, by means of a wavelet decomposition, low resolution frames and performing
on them a motion compensated spatio-temporal analysis. Then, a motion
compensated spatio-temporal analysis of each full resolution group of frames is
performed, and the low frequency subbands of the decomposition are replaced,
at each temporal decomposition level, by the corresponding spatio-temporal subbands
of the generated low resolution sequence. The modified sequence thus obtained is
finally coded. Thanks to this approach, a good behaviour at low resolution is
maintained (no more drift) while getting closer to the performance of a classic 3D
subband codec at full resolution.